Abstract

Software  reuse  is  only  effective  if  it  is  easier  to  locate  (and  appropriately  modify)  a  reusable  component 
than  to  write  it  from  scratch.  We  present  signature  matching  as  a  method  for  achieving  this  goal  by  using 
signature  information  easily  derived  from  the  component.  We  consider  two  kinds  of  software  components, 
functions  and  modules,  and  hence  two  kinds  of  matching,  function  matching  and  module  matching.  The 
signature  of  a  function  is  simply  its  type;  the  signature  of  a  module  is  a  multiset  of  user-defined  types  and 
a  multiset  of  function  signatures.  For  both  functions  and  modules,  we  consider  not  just  exact  match,  but 
also  various  flavors  of  relaxed  match.  We  briefly  describe  an  experimental  facility  written  in  Standard  ML 
for  performing  signature  matching  over  a  library  of  ML  functions. 
1. What is Signature Matching?


Software  reuse  sounds  like  a  good  idea.  It  promises  advantages  like  reducing  the  time  and  cost  spent 
on  programming,  increasing  programmers’  productivity,  and  increasing  program  reliability  [BP89,  AM87, 
IEE84, Pre87]. But why doesn’t it work in practice? One reason is that it is hard to find things. As libraries
of  software  components  get  larger,  this  problem  will  get  worse.  Reuse  is  only  worth  it  if  it  is  easier  to  locate 
(and  appropriately  modify)  a  reusable  component  than  to  write  it  from  scratch. 

Today,  if  we  want  to  find  some  desired  component,  we  could  use  the  component’s  name — if  we  are  lucky 
enough  to  know,  remember,  or  guess  it.  We  could  browse  through  the  library  itself,  or  perhaps  an  index 
into  the  library  (for  example,  as  with  a  Smalltalk  browser).  Given  that  the  components  over  which  we  are 
searching  are  program  units  (e.g.,  Pascal  procedures,  C  functions,  Ada  packages,  C++  or  Smalltalk  classes, 
or Modula-3 or ML modules), then we have another means for retrieval: signature matching. This paper
presents  the  foundations  for  what  signature  matching  means  and  briefly  describes  a  signature  matching 
facility  we  have  built  and  integrated  into  our  local  ML  programming  environment. 

To  illustrate  our  ideas  here  and  for  the  rest  of  this  paper,  consider  the  small  library  of  components  in 
Figure  1.  It  contains  three  ML  signature  modules,  LIST,  QUEUE,  and  SET,  which  together  define  seventeen 
functions, e.g., empty and cons.¹


signature LIST =
sig
    val empty   : unit → α list
    val cons    : (α, α list) → α list
    val hd      : α list → α
    val tl      : α list → α list
    val map     : (α → β) → α list → β list
    val intsort : ((int, int) → bool) → int list → int list
end;

signature QUEUE =
sig
    type α T
    val create : unit → α T
    val enq    : (α, α T) → α T
    val deq    : α T → (α, α T)
    val len    : α T → int
end;

signature SET =
sig
    type α T
    val create       : unit → α T
    val insert       : (α T, α) → α T
    val delete       : (α T, α) → α T
    val member       : (α, α T) → bool
    val union        : (α T, α T) → α T
    val intersection : (α T, α T) → α T
    val difference   : (α T, α T) → α T
end;

Figure 1: Three ML (Signature) Modules

If we are looking for a specific function, rather than perform a query based on its name, we could
perform a query based on the function’s type, which is the list of types of its input and output parameters
(and possibly information about what exceptions may be signaled). For example, $\alpha\ \mathit{list} \to \alpha$ is the type of
the function hd. If we are looking for a module, we could perform a query based on its interface, which is a
multiset of user-defined types and a multiset of function types. For example, SET has one user-defined type,
$\alpha\ T$, and seven function types. In practice, a library of software components is usually a set of program
modules; we can construct a function library from a module library by extracting all functions from each
module in the obvious way.

¹ ML signature modules are akin to Ada definition modules and Modula-3 interface modules; ML implementations
are written in modules called structures [MTH90].

Signature  matching  is  the  process  of  determining  when  a  library  component  “matches”  a  query.  We  can 
reasonably  assume  that  signature  information  is  either  provided  with  or  derivable  from  code  components, 
since  this  information  is  typically  required  by  the  compiler.  As  with  other  information  retrieval  methods, 
requiring  a  component  to  match  a  query  exactly  will  sometimes  be  too  strong.  There  may  be  a  component 
that  does  not  match  exactly,  but  is  similar  in  some  way  and  hence  would  match  a  query  if  the  component 
(or  query)  is  slightly  modified.  Thus,  in  addition  to  exact  match,  we  also  consider  cases  of  relaxed  matches 
between  a  query  and  a  library  component.  The  expectation  is  that  relaxed  matching  returns  components 
that  are  “close  enough”  to  be  useful  to  the  software  developer.  For  example,  relaxed  matching  on  functions 
might  allow  reordering  of  a  library  function's  input  parameters;  relaxed  matching  on  modules  might  require 
only  a  subset  of  the  library  module’s  functions. 

We  define  signature  matching  in  its  most  general  form  as  follows. 

Definition. Signature Match: Query Signature, Match Predicate, Component Library $\to$ Set of Components

$$\mathit{SignatureMatch}(q, M, C) = \{\, c \in C : M(c, q) \,\}$$

In other words, given a query, $q$, a match predicate, $M$, and a library of components, $C$, signature matching
returns a set of components, each of which satisfies the match predicate. This paper explores the design
space  of  signature  matching:  we  consider  two  kinds  of  library  components,  functions  and  modules,  and 
hence  consider  two  kinds  of  signature  match,  function  (type)  match  and  module  (interface)  match.  We  are 
interested  in  both  levels  of  signature  match  because  in  practice  we  expect  users  to  retrieve  at  different  levels 
of  granularity.  We  also  consider  different  kinds  of  match  predicates:  exact  match  and  various  relaxed  matches 
(for  both  function  and  module  match). 
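
The general definition above can be read directly as a filter over the library. The following is a minimal Python sketch of that reading, not the facility described in this paper; the string-valued signatures, the three-entry `library`, and the predicate `same_text` are assumptions made only for illustration.

```python
from typing import Callable, Dict, Set


def signature_match(q: str,
                    M: Callable[[str, str], bool],
                    C: Dict[str, str]) -> Set[str]:
    """Return the names of all components c in C for which M(c, q) holds."""
    return {name for name, signature in C.items() if M(signature, q)}


# Hypothetical library: component name -> signature, written as plain text.
library = {
    "hd": "a list -> a",
    "tl": "a list -> a list",
    "len": "a T -> int",
}


def same_text(component_signature: str, query: str) -> bool:
    # The crudest possible match predicate: textual equality.
    return component_signature == query


print(signature_match("a list -> a", same_text, library))
```

Any of the match predicates defined in the following sections could be passed in place of `same_text`; only the predicate changes, not the retrieval loop.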

In  a  broader  context,  signature  matching  can  be  viewed  as  another  instance  of  using  domain-specific 
information  to  aid  in  the  search  process.  Knowing  that  we  are  searching  program  modules  as  opposed 
to  uninterpreted  Unix  files  or  SQL  database  records  lets  us  exploit  the  structure  and  meaning  of  these 
components.  Using  domain-specific  information  is  an  idea  applicable  to  other  large  information  databases, 
e.g.,  the  nationwide  Library  of  Congress,  law  briefs,  police  records,  geological  maps,  and  may  prove  to  be 
key  in  grappling  with  the  problem  of  scale. 

By  not  requiring  users  to  know  the  name  (or  unique  identifier)  of  what  they  are  searching  for,  we  can 
also  view  signature  matching  as  an  example  of  content-addressable  search.  For  example,  users  formulate 
queries  in  terms  of  key-value  pairs  to  retrieve  records  from  a  relational  database.  Hence,  the  intended  pun 
in  our  title:  signature  matching  is  a  “key”  to  software  reuse. 

Signature  matching  is  useful  not  only  to  retrieve  components.  Software  developers  might  use  a  signature 
matcher  to  filter  out  the  bulk  of  library  components  from  further  consideration.  They  can  also  use  signature 
matchers to browse a software library in a structured way, e.g., by exploiting the partial ordering induced by
function types. We view signature matching as complementing standard search and browsing facilities, e.g.,
grep and ls, which provide a primitive means of accomplishing the same goals. A tool that does signature
matching is just one of many in a software developer's environment. Using a signature matcher should be
just as easy as searching for a string pattern.

We define module match in terms of function match. So we begin at the lowest level of granularity in
Section 2 by defining exact match and several relaxed matches for functions. Section 3 defines module match
and its relaxations. In Section 4 we describe our signature matching facility, which we have used in searching
over a collection of Standard ML modules. We compare our work with other approaches in Section 5 and
close with a summary and suggestions for future work in Section 6.


2.  Function  Matching 


Function matching based on just signature information boils down to type matching, in particular matching
function types. The following definition of types is based on Field and Harrison [FH88]. A type is either
a type variable ($\in \mathit{TypeVar}$, denoted by Greek letters) or a type operator applied to other types. Type
operators ($\mathit{TypeOp}$) are either built-in operators ($\mathit{BuiltInOp}$) or user-defined operators ($\mathit{UserOp}$). Each type
operator has an arity indicating the number of type arguments. Base types are operators of arity 0, e.g.,
$\mathit{int}$, $\mathit{bool}$; the “arrow” constructor for function types is binary, e.g., $\mathit{int} \to \mathit{bool}$. We use infix notation for
tuple construction ($,$) and functions ($\to$), and otherwise use postfix notation for type operators (e.g., $\mathit{int}\ \mathit{list}$
stands for the “list of integers” type). The user-defined type, $\alpha\ T$, represents a type operator $T$ with arity
1, where the type of the argument to $T$ is $\alpha$.² In general, when we refer to type operators, they can be
either built-in or user-defined (i.e., $\mathit{BuiltInOp} \cup \mathit{UserOp}$). Two types $\tau$ and $\tau'$ are equal ($\tau = \tau'$) if either
they are the same type variable, or $\tau = \mathit{typeOp}(\tau_1, \ldots, \tau_n)$, $\tau' = \mathit{typeOp}'(\tau'_1, \ldots, \tau'_n)$, $\mathit{typeOp} = \mathit{typeOp}'$, and
$\forall\, 1 \le i \le n,\ \tau_i = \tau'_i$. Polymorphic types contain at least one type variable; types that do not contain any
type variables are monomorphic.
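
To make the definition of types concrete, here is a small Python sketch of one possible encoding. The encoding itself is an assumption for illustration: a type variable is a Python string, and an operator application is a tuple `(operator_name, arg_1, ..., arg_n)`, so a base type is a one-element tuple. The helper names are hypothetical.

```python
def is_var(t):
    """Type variables are encoded as plain strings, e.g. "a" for alpha."""
    return isinstance(t, str)


def types_equal(t1, t2):
    """Structural equality, following the definition in the text."""
    if is_var(t1) or is_var(t2):
        return t1 == t2                      # same type variable
    return (t1[0] == t2[0]                   # same type operator
            and len(t1) == len(t2)           # same arity
            and all(types_equal(a, b) for a, b in zip(t1[1:], t2[1:])))


def type_vars(t):
    """Set of type variables occurring in t."""
    if is_var(t):
        return {t}
    found = set()
    for arg in t[1:]:
        found |= type_vars(arg)
    return found


def is_polymorphic(t):
    return bool(type_vars(t))


INT = ("int",)                               # base type: operator of arity 0
INT_LIST = ("list", INT)                     # int list
HD = ("->", ("list", "a"), "a")              # alpha list -> alpha

print(is_polymorphic(HD), is_polymorphic(INT_LIST))
```

With this encoding Python's built-in tuple equality would give the same answer as `types_equal`; the explicit function is kept only to mirror the definition clause by clause.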

To allow substitution of other types for type variables, we introduce notation for variable substitution:
$[\tau'/\alpha]\tau$ represents the type that results from replacing all occurrences of the type variable $\alpha$ in $\tau$ with $\tau'$,
provided no variables in $\tau'$ occur in $\tau$ (read as “$\tau'$ replaces $\alpha$ in $\tau$”). For example, $[(\mathit{int} \to \mathit{int})/\beta](\alpha \to \beta)
= \alpha \to (\mathit{int} \to \mathit{int})$. A sequence of substitutions is right associative. For example,
$[\beta/\gamma][\alpha/\beta](\beta \to \gamma) = [\beta/\gamma](\alpha \to \gamma) = (\alpha \to \beta)$.
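
The two worked substitutions above can be checked mechanically. The sketch below assumes a tuple encoding of types (strings for type variables, `(operator, args...)` tuples otherwise) and hypothetical helper names; it does not enforce the side condition on variables of the replacement type.

```python
def fn(a, b):
    """Build the function type a -> b."""
    return ("->", a, b)


INT = ("int",)


def substitute(new, var, t):
    """[new/var]t: replace every occurrence of type variable var in t."""
    if isinstance(t, str):
        return new if t == var else t
    return (t[0],) + tuple(substitute(new, var, arg) for arg in t[1:])


def substitute_seq(subs, t):
    """Apply [n1/v1][n2/v2]...t; right associative, so the last pair goes first."""
    for new, var in reversed(subs):
        t = substitute(new, var, t)
    return t


# [(int -> int)/beta](alpha -> beta) = alpha -> (int -> int)
print(substitute(fn(INT, INT), "b", fn("a", "b")))

# [beta/gamma][alpha/beta](beta -> gamma) = alpha -> beta
print(substitute_seq([("b", "g"), ("a", "b")], fn("b", "g")))
```

Applying the same two substitutions in the opposite order would yield a different type, which is why the associativity convention has to be fixed.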

In the case where $\tau'$ is just a variable, $[\tau'/\alpha]\tau$ is simply variable renaming. For variable renaming, $\alpha, \tau' \in
\mathit{TypeVar}$ or $\alpha, \tau' \in \mathit{UserOp}$. We think of user-defined type operator names as variables for the purposes of
renaming, since different users may use a different name for the same type operator. Renaming sequences
may include both type variable renaming and user-defined type operator renaming, e.g.,
$[\beta/\alpha][C/T](\alpha\ T \to \alpha) = (\beta\ C \to \beta)$. We will use $V$ for a sequence of variable renamings and $U$ for a sequence
of more general substitutions.

Given the type of a function from a component library, $\tau_l$, and the type of a query, $\tau_q$, we define a generic
form of function match, $M(\tau_l, \tau_q)$, as follows:

Definition. (Generic Function Match) $M$: Library Type, Query Type $\to$ Boolean

$$M(\tau_l, \tau_q) = T_l(\tau_l)\ R\ T_q(\tau_q)$$

where $T_l$ and $T_q$ are transformations (e.g., reordering) and $R$ is some relationship between types (e.g.,
equality). Most of the matches we define apply transformations to only one of the types. Where possible,
we apply the transformation to the library type, $\tau_l$, in which case $T_q$ is simply the identity function. For
example, in exact match, two types match if they are equal modulo variable renaming. In this case, $T_l$ is a
sequence of variable renamings, $T_q$ is the identity function, and $R$ is the type equality ($=$) relation.
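
The generic form is essentially a recipe for building match predicates from two transformations and a relation. A minimal Python sketch of that recipe follows; `make_match`, `swap_pair_args`, and the tuple encoding of types are all assumptions introduced here for illustration, and `swap_pair_args` is only a toy transformation, not one of the relaxations defined later.

```python
def make_match(T_l, T_q, R):
    """Build M(tau_l, tau_q) = T_l(tau_l) R T_q(tau_q)."""
    return lambda tau_l, tau_q: R(T_l(tau_l), T_q(tau_q))


def identity(t):
    return t


def equal(t1, t2):
    return t1 == t2


def swap_pair_args(t):
    """Toy transformation: swap the components of a two-element tuple argument."""
    if not isinstance(t, str) and t[0] == "->":
        arg = t[1]
        if not isinstance(arg, str) and arg[0] == "tuple" and len(arg) == 3:
            return ("->", ("tuple", arg[2], arg[1]), t[2])
    return t


strict = make_match(identity, identity, equal)
swapped = make_match(swap_pair_args, identity, equal)

BOOL = ("bool",)
library_type = ("->", ("tuple", ("list", "a"), "a"), BOOL)   # (a list, a) -> bool
query_type = ("->", ("tuple", "a", ("list", "a")), BOOL)     # (a, a list) -> bool

print(strict(library_type, query_type), swapped(library_type, query_type))
```

Here the transformation is applied to the library type and the query transformation is the identity, matching the convention stated above.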

We classify relaxed function matches as either partial matches, which vary $R$, the relationship between
$\tau_l$ and $\tau_q$ (e.g., define $R$ to be a partial order), or transformation matches, which vary $T_l$ or $T_q$, the
transformations on types. In the following subsections, we first define exact match, followed by partial matches,
transformation matches, and combined matches. Each of these match predicates can be used to instantiate
the $M$ of Signature Match, the general signature match function defined in Section 1.

² We deviate from ML's convention of using * for tuple construction; the comma is easier on the eyes. Also, in ML, the
common programming practice is to use T for the operator name of the user-defined type of interest.


2.1. Exact Match

Definition: (Exact Match)

$$\mathit{match}_E(\tau_l, \tau_q) = \exists \text{ a sequence of variable renamings, } V, \text{ such that } V\,\tau_l = \tau_q$$

Two function types match exactly if they match modulo variable renaming. Recall that variable renaming
may rename either type variables or user-defined type operators. For monomorphic types with no user-defined
types, there are no variables, so $\mathit{match}_E(\tau_l, \tau_q) = (\tau_l = \tau_q)$ where $\tau_l$ and $\tau_q$ are monomorphic. We
only need a sequence of renamings for one of the type expressions, since for any two renamings, $V_1$ and $V_2$,
such that $V_1\,\tau_1 = V_2\,\tau_2$, we could construct a $V'$ such that $V'\,\tau_1 = \tau_2$. (Note we could consider $\mathit{match}_E$ as a
form of transformation match since it allows variable renaming.)

For polymorphic types, actual variable names do not matter, provided there is a way to rename variables
so that the two types are identical. For example, $\tau_l = (\alpha, \alpha) \to \mathit{bool}$ matches $\tau_q = (\beta, \beta) \to \mathit{bool}$ with the
substitution $V = [\beta/\alpha]$. But $\tau_l = \alpha \to \beta$ and $\tau_q = \gamma \to \gamma$ do not match because once we substitute $\gamma$
for $\alpha$ to get $\gamma \to \beta$, we cannot substitute $\gamma$ for $\beta$, since $\gamma$ already occurs in the type. This is the “right
thing” because the difference between $\tau_l$ and $\tau_q$ is more than just variable names; $\tau_q$ takes a value of some
type $\gamma$ and returns a value of the same type, whereas $\tau_l$ takes a value of some type and returns a value of a
potentially different type.

To see how exact match might be useful in practice, consider two examples where the library is the set
of all functions in Figure 1. Suppose a user wants to locate a function that applies an input function to each
element of a list, forming a new list. The query $\tau_q = (\alpha \to \gamma) \to \alpha\ \mathit{list} \to \gamma\ \mathit{list}$ matches the map function
(with the renaming $[\gamma/\beta]$), exactly what the user wants. As a second example, suppose a user wants to
locate a function to add an element to a collection with the query $\tau_q = (\alpha, \alpha\ C) \to \alpha\ C$. This query retrieves
the enq function on queues (with the renaming $[C/T]$), which may be what the user wants, but not the
insert function on sets, another likely candidate.
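
One way to decide exact match is to walk both types in parallel while building a consistent, one-to-one renaming of type variables and of user-defined operator names. The Python sketch below takes that approach; it is an illustrative assumption, not the algorithm of the facility described later. Types are encoded as strings (type variables) or `(operator, args...)` tuples, and every operator name outside the hypothetical `BUILTIN` set is treated as user-defined.

```python
BUILTIN = {"->", "tuple", "list", "int", "bool", "unit"}


def exact_match(tau_l, tau_q):
    """True if tau_l equals tau_q up to renaming of variables and user operators."""
    var_map, op_map = {}, {}

    def bind(mapping, src, dst):
        # A renaming must be consistent and must not merge two names into one.
        if src in mapping:
            return mapping[src] == dst
        if dst in mapping.values():
            return False
        mapping[src] = dst
        return True

    def walk(l, q):
        if isinstance(l, str) or isinstance(q, str):
            return isinstance(l, str) and isinstance(q, str) and bind(var_map, l, q)
        if len(l) != len(q):
            return False
        if l[0] in BUILTIN or q[0] in BUILTIN:
            if l[0] != q[0]:
                return False
        elif not bind(op_map, l[0], q[0]):
            return False
        return all(walk(a, b) for a, b in zip(l[1:], q[1:]))

    return walk(tau_l, tau_q)


def fn(a, b): return ("->", a, b)
def tup(*ts): return ("tuple",) + ts
def lst(t): return ("list", t)


BOOL = ("bool",)
MAP = fn(fn("a", "b"), fn(lst("a"), lst("b")))
ENQ = fn(tup("a", ("T", "a")), ("T", "a"))
INSERT = fn(tup(("T", "a"), "a"), ("T", "a"))
ADD_QUERY = fn(tup("a", ("C", "a")), ("C", "a"))     # (alpha, alpha C) -> alpha C

print(exact_match(ENQ, ADD_QUERY), exact_match(INSERT, ADD_QUERY))
```

The one-to-one requirement in `bind` is what rejects the pair $\alpha \to \beta$ and $\gamma \to \gamma$ discussed in the text.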


2.2.  Partial  Relaxations 

As  we  have  just  seen,  exact  match  is  a  useful  starting  point,  but  may  miss  useful  functions  whose  types  are 
close  but  do  not  exactly  match  the  query.  Exact  match  requires  a  querier  to  be  either  familiar  with  a  library, 
or  lucky  in  choosing  the  exact  syntactic  format  of  a  type. 

Often a user with a specific query type, e.g., $\mathit{int}\ \mathit{list} \to \mathit{int}\ \mathit{list}$, could just as easily use an instantiation
of a more general function, e.g., $\alpha\ \mathit{list} \to \alpha\ \mathit{list}$. Or, the user may have difficulty determining the most
general  type  of  the  desired  function  but  can  give  an  example  of  what  is  desired.  Allowing  more  general  types 
to  match  a  query  type  accommodates  these  kinds  of  situations.  Conversely,  we  can  also  imagine  cases  where 
a  querier  asks  for  a  general  type  that  does  not  match  anything  in  the  library  exactly.  There  may  be  a  useful 
function  in  the  library  whose  type  is  more  specific,  but  the  code  could  be  easily  generalized  to  be  useful  to 
the  querier.  We  define  generalized  and  specialized  match  to  address  both  of  these  cases. 

Referring back to our definition of generic function match, for exact match, the relation, $R$, between
types is equality. For partial matches we relax this relation to be a partial order on types. We use variable
substitution to define the partial ordering, based on the “generality” of the types. For example, $\alpha \to \alpha$
is a generalization of infinitely many types, including $\mathit{int} \to \mathit{int}$ and $(\mathit{int}, \beta) \to (\mathit{int}, \beta)$, using the variable
substitutions $[\mathit{int}/\alpha]$ and $[(\mathit{int}, \beta)/\alpha]$, respectively.

$\tau$ is more general than $\tau'$ ($\tau \ge \tau'$) if the type $\tau'$ is the result of a (possibly empty) sequence of variable
substitutions applied to type $\tau$. Equivalently, we say $\tau'$ is an instance of $\tau$ ($\tau' \le \tau$). We would typically
expect functions in a library to have as general a type as possible.

Definition: (Generalized Match)

$$\mathit{match}_{gen}(\tau_l, \tau_q) = \tau_l \ge \tau_q$$

A library type matches a query type if the library type is more general than the query type. Exact match,
with variable renaming, is really just a special case of generalized match where all the substitutions are
variables, so $\mathit{match}_E(\tau_l, \tau_q) \Rightarrow \mathit{match}_{gen}(\tau_l, \tau_q)$.

For example, suppose a user needs a function to convert a list of integers to a list of boolean values, where
each boolean corresponds to whether or not the corresponding integer is positive. The user might write a
query like $\tau_q = (\mathit{int} \to \mathit{bool}) \to \mathit{int}\ \mathit{list} \to \mathit{bool}\ \mathit{list}$. This query does not match exactly with any function in
our library. But a generalized match would return map for this query, since map's type is more general than
the query type. This kind of match is especially desirable, since the user does not need to make any changes
to use the more general function.

Definition: (Specialized Match)

$$\mathit{match}_{spec}(\tau_l, \tau_q) = \tau_l \le \tau_q$$

Specialized match is the converse of generalized match. In fact, we could alternatively define $\mathit{match}_{spec}$ in
terms of $\mathit{match}_{gen}$ by swapping the order of the types: $\mathit{match}_{spec}(\tau_l, \tau_q) = \mathit{match}_{gen}(\tau_q, \tau_l)$. It also follows
that exact match is a special case of specialized match: $\mathit{match}_E(\tau_l, \tau_q) \Rightarrow \mathit{match}_{spec}(\tau_l, \tau_q)$.

As an example of how specialized match can be useful, suppose the querier needs a general function
to sort lists and uses the query $\tau_q = ((\alpha, \alpha) \to \mathit{bool}) \to \alpha\ \mathit{list} \to \alpha\ \mathit{list}$. Our library does not contain
such a function, but specialized match would return intsort, an integer sorting function with the type
$\tau_l = ((\mathit{int}, \mathit{int}) \to \mathit{bool}) \to \mathit{int}\ \mathit{list} \to \mathit{int}\ \mathit{list}$. Assuming intsort is written reasonably well, it should
be easy for the querier to modify it to sort arbitrary objects since the comparison function is passed as a
parameter.
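
Both partial matches reduce to a single "is an instance of" test. The sketch below implements that test by one-way matching: it tries to build a substitution for the variables of the more general type. It is a simplified illustration under stated assumptions: types are strings (type variables) or `(operator, args...)` tuples, operators are compared by name, and renaming of user-defined operators is not attempted. The function names are hypothetical.

```python
def more_general(general, specific):
    """True if some substitution for the variables of `general` yields `specific`."""
    subst = {}

    def walk(g, s):
        if isinstance(g, str):               # g is a type variable
            if g in subst:
                return subst[g] == s         # must be substituted consistently
            subst[g] = s
            return True
        if isinstance(s, str) or g[0] != s[0] or len(g) != len(s):
            return False
        return all(walk(a, b) for a, b in zip(g[1:], s[1:]))

    return walk(general, specific)


def match_gen(tau_l, tau_q):
    return more_general(tau_l, tau_q)


def match_spec(tau_l, tau_q):
    return more_general(tau_q, tau_l)


def fn(a, b): return ("->", a, b)
def tup(*ts): return ("tuple",) + ts
def lst(t): return ("list", t)


INT, BOOL = ("int",), ("bool",)
MAP = fn(fn("a", "b"), fn(lst("a"), lst("b")))
INTSORT = fn(fn(tup(INT, INT), BOOL), fn(lst(INT), lst(INT)))

positive_query = fn(fn(INT, BOOL), fn(lst(INT), lst(BOOL)))
sort_query = fn(fn(tup("a", "a"), BOOL), fn(lst("a"), lst("a")))

print(match_gen(MAP, positive_query), match_spec(INTSORT, sort_query))
```

The two queries are the ones used in the examples above: the first retrieves map by generalized match, the second retrieves intsort by specialized match.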

Note that although we present generalized and specialized match in terms of changing the relation ($R$)
between $\tau_l$ and $\tau_q$, we could also define them as transformation matches, since the definition of the $\le$ relation
on types is in terms of variable substitution.

Definition (alternate): (Generalized and Specialized Match)

$$\mathit{match}_{gen}(\tau_l, \tau_q) = \exists \text{ a sequence of variable substitutions, } U, \text{ such that } \mathit{match}_E(U\,\tau_l, \tau_q)$$

$$\mathit{match}_{spec}(\tau_l, \tau_q) = \exists \text{ a sequence of variable substitutions, } U, \text{ such that } \mathit{match}_E(\tau_l, U\,\tau_q)$$

We can even define $\mathit{match}_{gen}(\tau_l, \tau_q)$ as $U\,\tau_l = \tau_q$; the use of $\mathit{match}_E$ is redundant since generalized match
requires a sequence of substitutions that includes any necessary variable renaming. We will appeal to the
above definitions in terms of $\mathit{match}_E$ when we define the composition of different kinds of relaxed matches (Section 2.4).

Finally, using these alternate definitions of generalized and specialized match, we can define type unification
[FH88] by combining these two relaxed matches and allowing the renaming to occur on either type.

Definition: (Unify Match)

$$\mathit{match}_{unify}(\tau_l, \tau_q) = \exists \text{ a sequence of variable substitutions, } U, \text{ such that } \mathit{match}_E(U\,\tau_l, U\,\tau_q)$$

In practice, we do not expect unify match to be of as much use as either generalized or specialized match,
since the relation between types $\tau_q$ and $\tau_l$ is more complicated with unification. However, it is important to
relate type unification and type matching, i.e., the former is definable in terms of the latter.
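
For comparison with the one-directional partial matches, unify match can be decided with a textbook first-order unification procedure. The Python sketch below is such a procedure, offered as an illustration rather than as the method used by the authors. It assumes types are strings (type variables) or `(operator, args...)` tuples, that the two types use distinct variable names, and that operators are compared by name.

```python
def unify_match(tau_l, tau_q):
    """True if one substitution U makes U(tau_l) and U(tau_q) identical."""
    subst = {}

    def resolve(t):
        # Follow variable bindings until an unbound variable or an operator.
        while isinstance(t, str) and t in subst:
            t = subst[t]
        return t

    def occurs(var, t):
        t = resolve(t)
        if isinstance(t, str):
            return t == var
        return any(occurs(var, arg) for arg in t[1:])

    def unify(a, b):
        a, b = resolve(a), resolve(b)
        if isinstance(a, str):
            if a == b:
                return True
            if occurs(a, b):                 # would build an infinite type
                return False
            subst[a] = b
            return True
        if isinstance(b, str):
            return unify(b, a)
        if a[0] != b[0] or len(a) != len(b):
            return False
        return all(unify(x, y) for x, y in zip(a[1:], b[1:]))

    return unify(tau_l, tau_q)


def fn(a, b): return ("->", a, b)


INT, BOOL = ("int",), ("bool",)

# alpha -> int and bool -> beta: neither is an instance of the other,
# but the substitution [bool/alpha][int/beta] makes them identical.
print(unify_match(fn("a", INT), fn(BOOL, "b")))
```

The example pair illustrates why the relation between the two types is harder to interpret here: substitutions are needed on both sides.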


2.3.  Transformation  Relaxations 

Other kinds of relaxed match on functions transform the order or form of parts of a type expression to
achieve a match. Examples include changing whether a function is curried or uncurried, changing the order
of types in a tuple, and changing the order of arguments to a function (for functions that take more than
one argument). These last two are the same since we can view multiple arguments to a function as a tuple.

For example, the query $\tau_q = \alpha \to \alpha\ \mathit{list} \to \alpha\ \mathit{list}$ would miss the cons function because $\tau_q$ is curried
while cons is not, and $\tau_q = (\alpha\ \mathit{list}, \alpha) \to \alpha\ \mathit{list}$ would miss cons because the types in the tuple are in a
different order.


2.3.1.  Uncurrying  Functions 


A function that takes multiple arguments may be either curried or uncurried. The uncurried version of a
function has a type $(\tau_1, \ldots, \tau_{n-1}) \to \tau_n$, while the corresponding curried version has a type
$\tau_1 \to \cdots \to \tau_{n-1} \to \tau_n$. In many cases, it will not matter to the querier whether or not a function is curried. We define
uncurry match by applying the uncurry transformation to both query and library types. We choose to
uncurry rather than curry each type so that we can later compose this relaxed match with one that reorders
the types in a tuple.


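The uncurry transformation, which produces an uncurried version of a given type, is defined as follows:

$$\mathit{uncurry}(\tau) = \begin{cases} (\tau_1, \ldots, \tau_{n-1}) \to \tau_n & \text{if } \tau = \tau_1 \to \cdots \to \tau_{n-1} \to \tau_n,\ n > 2 \\ \tau & \text{otherwise} \end{cases}$$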


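The uncurry transformation is non-recursive; any nested functions will not be uncurried. We also define a
recursive version, $\mathit{uncurry}^*$:

$$\mathit{uncurry}^*(\tau) = \begin{cases} (\mathit{uncurry}^*(\tau_1), \ldots, \mathit{uncurry}^*(\tau_{n-1})) \to \mathit{uncurry}^*(\tau_n) & \text{if } \tau = \tau_1 \to \cdots \to \tau_{n-1} \to \tau_n,\ n > 2 \\ \mathit{typeOp}(\mathit{uncurry}^*(\tau_1), \ldots, \mathit{uncurry}^*(\tau_n)) & \text{if } \tau = \mathit{typeOp}(\tau_1, \ldots, \tau_n) \\ \tau & \text{where } \tau \text{ is a variable or a base type} \end{cases}$$

For example, if $\tau = \mathit{int} \to \mathit{int} \to (\mathit{int} \to \mathit{int} \to \mathit{bool}) \to \mathit{bool}$ then
$\mathit{uncurry}(\tau) = (\mathit{int}, \mathit{int}, (\mathit{int} \to \mathit{int} \to \mathit{bool})) \to \mathit{bool}$ and
$\mathit{uncurry}^*(\tau) = (\mathit{int}, \mathit{int}, ((\mathit{int}, \mathit{int}) \to \mathit{bool})) \to \mathit{bool}$.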
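
The two transformations can be written down almost verbatim from their case definitions. The following Python sketch does so under an assumed encoding of types (strings for type variables, `(operator, args...)` tuples otherwise, with the arrow as the binary operator `"->"`); the helper `arrow_chain` is a hypothetical name introduced here.

```python
def fn(a, b): return ("->", a, b)
def tup(*ts): return ("tuple",) + ts


INT, BOOL = ("int",), ("bool",)


def arrow_chain(t):
    """Split t1 -> ... -> tn into [t1, ..., tn]; a non-function type gives [t]."""
    parts = []
    while not isinstance(t, str) and t[0] == "->":
        parts.append(t[1])
        t = t[2]
    parts.append(t)
    return parts


def uncurry(t):
    """Non-recursive uncurry: only the outermost chain of arrows is flattened."""
    parts = arrow_chain(t)
    if len(parts) > 2:
        return fn(tup(*parts[:-1]), parts[-1])
    return t


def uncurry_star(t):
    """Recursive uncurry: nested function types are uncurried as well."""
    if isinstance(t, str):
        return t
    parts = arrow_chain(t)
    if len(parts) > 2:
        args = [uncurry_star(p) for p in parts[:-1]]
        return fn(tup(*args), uncurry_star(parts[-1]))
    return (t[0],) + tuple(uncurry_star(arg) for arg in t[1:])


# tau = int -> int -> (int -> int -> bool) -> bool
compare = fn(INT, fn(INT, BOOL))
tau = fn(INT, fn(INT, fn(compare, BOOL)))

print(uncurry(tau))
print(uncurry_star(tau))
```

The value `tau` is the example type from the text; the two printed results differ only in whether the functional argument has itself been uncurried.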


Definition: (Uncurry Match and Recursive Uncurry Match)

$$\mathit{match}_{uncurry}(\tau_l, \tau_q) = \mathit{match}_E(\mathit{uncurry}(\tau_l), \mathit{uncurry}(\tau_q))$$

$$\mathit{match}_{uncurry^*}(\tau_l, \tau_q) = \mathit{match}_E(\mathit{uncurry}^*(\tau_l), \mathit{uncurry}^*(\tau_q))$$

Uncurry match takes two uncurried function types and determines whether their corresponding argument
types match. Recursive uncurry match is similar but allows recursive uncurrying of $\tau_l$'s and $\tau_q$'s functional
arguments. By applying the uncurry (or $\mathit{uncurry}^*$) transformation to both $\tau_l$ and $\tau_q$, we are transforming
the types into a canonical form, and then checking that the resulting types are equal (modulo variable
renaming). For example, suppose a user needs a function to add an element to a collection. The query
$\tau_q = \alpha \to \alpha\ T \to \alpha\ T$ does not match exactly with any functions in our library, but uncurry match would
return the function enq on a queue.

Since the uncurry transformation is applied to both the query and library types, it is not necessary to
define an additional curry match. Such a match would be similar in structure, relying on a curry transformation
to produce a curried version of a given type; that is, $\mathit{match}_{curry}(\tau_l, \tau_q) = \mathit{match}_E(\mathit{curry}(\tau_l), \mathit{curry}(\tau_q))$.
Note that $\mathit{match}_{curry}(\tau_l, \tau_q)$ if and only if $\mathit{match}_{uncurry}(\tau_l, \tau_q)$.


2.3.2.  Reordering  Tuples 

One common use of tuples is to group multiple arguments to a function where the order of the arguments
does not matter. For example, a function to test membership in a list could have type $(\alpha, \alpha\ \mathit{list}) \to \mathit{bool}$ or
type $(\alpha\ \mathit{list}, \alpha) \to \mathit{bool}$. Reorder match allows matching on types that differ only in their order of arguments.

We define reorder match in terms of permutations. Given a function type whose first argument is a tuple
(e.g., $\tau = (\tau_1, \ldots, \tau_{n-1}) \to \tau_n$), a permutation $\sigma$ is a one-to-one mapping with domain and range $1 \ldots n-1$
such that $\sigma(\tau) = (\tau_{\sigma(1)}, \ldots, \tau_{\sigma(n-1)}) \to \tau_n$.


Definition: (Reorder Match)

$$\mathit{match}_{reorder}(\tau_l, \tau_q) = \exists \text{ a permutation } \sigma \text{ such that } \mathit{match}_E(\sigma(\tau_l), \tau_q)$$

Under this relaxation, a library type, $\tau_l$, matches a query type, $\tau_q$, if the argument types of $\tau_l$ can be reordered
so that the types match exactly. For this match to succeed, both $\tau_l$ and $\tau_q$ must be function types whose first
arguments are tuples. Although we choose to apply the permutation transformation, $\sigma$, to the library type $\tau_l$,
we could equivalently apply the inverse, $\sigma^{-1}$, to the query type $\tau_q$: $\mathit{match}_E(\sigma(\tau_l), \tau_q) = \mathit{match}_E(\tau_l, \sigma^{-1}(\tau_q))$.

Suppose we again are looking for a function that adds an element to a collection. To find it, we might
pose the query, $\tau_q = (\alpha, \alpha\ T) \to \alpha\ T$. With exact match we would find the enq function on queues, but with
reorder match we would additionally find the insert and delete functions on sets. The functions enq and
insert are both potentially useful.

There are two variations on reorder match: we can allow recursive permutations so that a tuple's component
types may be reordered; and we can allow reordering of arguments to user-defined type operators,
e.g., so that $(\mathit{int}, \alpha)\ T \to \mathit{int}$ and $(\alpha, \mathit{int})\ T \to \mathit{int}$ would match.
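
A direct, if brute-force, way to realize the basic (non-recursive) reorder match is to try every permutation of the library type's argument tuple and test each against the query with exact match. The Python sketch below does this; it is an illustration under assumptions, not the authors' implementation. Types are strings (type variables) or `(operator, args...)` tuples, and every operator name outside the hypothetical `BUILTIN` set counts as user-defined.

```python
from itertools import permutations

BUILTIN = {"->", "tuple", "list", "int", "bool", "unit"}


def exact_match(tau_l, tau_q):
    """Equality up to one-to-one renaming of variables and user operators."""
    var_map, op_map = {}, {}

    def bind(mapping, src, dst):
        if src in mapping:
            return mapping[src] == dst
        if dst in mapping.values():
            return False
        mapping[src] = dst
        return True

    def walk(l, q):
        if isinstance(l, str) or isinstance(q, str):
            return isinstance(l, str) and isinstance(q, str) and bind(var_map, l, q)
        if len(l) != len(q):
            return False
        if l[0] in BUILTIN or q[0] in BUILTIN:
            if l[0] != q[0]:
                return False
        elif not bind(op_map, l[0], q[0]):
            return False
        return all(walk(a, b) for a, b in zip(l[1:], q[1:]))

    return walk(tau_l, tau_q)


def reorderings(t):
    """Yield sigma(t) for every permutation sigma of t's argument tuple."""
    if not isinstance(t, str) and t[0] == "->":
        arg = t[1]
        if not isinstance(arg, str) and arg[0] == "tuple":
            for p in permutations(arg[1:]):
                yield ("->", ("tuple",) + p, t[2])


def reorder_match(tau_l, tau_q):
    return any(exact_match(p, tau_q) for p in reorderings(tau_l))


def fn(a, b): return ("->", a, b)
def tup(*ts): return ("tuple",) + ts


SET = ("T", "a")
library = {
    "enq": fn(tup("a", SET), SET),
    "insert": fn(tup(SET, "a"), SET),
    "delete": fn(tup(SET, "a"), SET),
    "member": fn(tup("a", SET), ("bool",)),
    "union": fn(tup(SET, SET), SET),
}
query = fn(tup("a", SET), SET)               # (alpha, alpha T) -> alpha T

print(sorted(name for name, t in library.items() if reorder_match(t, query)))
```

Trying all permutations is factorial in the tuple length, which is acceptable for a sketch since argument tuples are short; a practical tool might canonicalize the tuple order instead.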


2.4.  Combining  Relaxations 

Each relaxed match is individually a useful match to apply when searching for a function of a given type.
Combinations of these separately defined relaxed matches widen the set of library types retrieved. For
example, in searching for a function to add to a collection, uncurry match on the query $\alpha \to \alpha\ T \to \alpha\ T$
retrieves enq but misses insert. To retrieve both functions with this query, we need a way to combine the
different relaxed matches.

We deliberately gave our definitions in a form so that we can easily compose them. Each of the relaxed
match definitions presented in Sections 2.2 and 2.3 (using the alternate definition of $\mathit{match}_{gen}$ and $\mathit{match}_{spec}$)
can be cast in the general form:

$$\exists \text{ a pair of transformations, } T = (T_l, T_q), \text{ such that } \mathit{match}_E(T_l(\tau_l), T_q(\tau_q)).$$

For $\mathit{match}_{uncurry}$, the “$\exists$” is not necessary, since there is only one possible uncurry transformation. For
$\mathit{match}_{gen}$ and $\mathit{match}_{reorder}$, $T_q$ is the identity function, and in $\mathit{match}_{spec}$, $T_l$ is the identity function.

The match composition of two relaxed matches, denoted as $(\mathit{match}_{T1} \circ \mathit{match}_{T2})$, is defined by applying
the inner ($T2$) transformation first:

Definition: (Match Composition). Let $T1$ and $T2$ each be a pair of transformations.

$$(\mathit{match}_{T1} \circ \mathit{match}_{T2})(\tau_l, \tau_q) = \exists\, T1, T2 \text{ such that } \mathit{match}_E(T1_l(T2_l(\tau_l)),\ T1_q(T2_q(\tau_q)))$$

Thus we can compose any number of relaxed matches in any order. The order in which they are composed
does make a difference; transformations are not generally commutative. For simplicity, we omit the recursive
versions of $\mathit{match}_{uncurry}$ and $\mathit{match}_{reorder}$, although the analysis below could be easily extended to include
them. Since $\mathit{match}_{spec}$ and $\mathit{match}_{unify}$ can be defined in terms of $\mathit{match}_{gen}$,³ we also exclude them in our
analysis. Thus, there are three “basic” relaxed matches: $\mathit{match}_{gen}$, $\mathit{match}_{uncurry}$, and $\mathit{match}_{reorder}$. We
now consider some of the interesting combinations of these relaxed matches, those we expect queriers to find
useful. The relations between the various combinations of relaxed matches lead to a natural partial ordering
relation on combined matches, based on the set of function types that a match defines (namely, the set of
types that match a given query type).


•  (malcAfeof^f  o  matchuncurr y)(n,  t^)  . 

With  this  composition,  two  types  match  if  they  are  equivalent  modulo  whether  or  not  they  are  curried 
or  whether  or  not  the  arguments  are  in  the  same  order.  We  uncurry  the  types  first,  thereby  allowing 
a  reordering  on  any  tuples  formed  by  uncurrying.  Using  this  composition,  the  query  type  r,  =  a  — 
a  T  — »  a  T  would  match  enq  (77  =  (a, a  T)  —  a  T)  on  queues  and  insert  and  delete  (rj  = 
(a  T,  a)  — *  a  T)  on  sets. 

•  (matchgtn  o  maichuncurry)(Tj tTq)  . 

77  and  rq  match  if  the  uncurried  form  of  r(  is  more  general  than  the  uncurried  form  of  Tq 

•  (match yen  O  matchreorder)(Tl ,  Ty)  . 

77  and  rq  match  if  some  permutation  of  77  is  more  general  than  r, . 

• $(match_{gen} \circ match_{reorder} \circ match_{uncurry})(\tau_l, \tau_q)$:

$\tau_l$ and $\tau_q$ match if some permutation of the uncurried form of $\tau_l$ is more general than the
uncurried form of $\tau_q$. Using this combined match with the order of $\tau_l$ and $\tau_q$ reversed (i.e.,
using $match_{spec}$ instead of $match_{gen}$), the query
$\tau_q = (\alpha\ list, (\alpha, \alpha \rightarrow bool)) \rightarrow \alpha\ list$ matches the intsort
function in our library ($\tau_l = (int, int \rightarrow bool) \rightarrow intlist \rightarrow intlist$).

$match_{gen}$ does not commute with either $match_{uncurry}$ or $match_{reorder}$ because in either case,
the variable substitution from generalizing could introduce a type that could then be transformed by
uncurrying or reordering, but would not be transformed if the variable substitution is done last. For
example, suppose $\tau_q = (bool, int) \rightarrow (int, bool)$ and $\tau_l = \alpha \rightarrow \alpha$.
$(match_{gen} \circ match_{reorder})(\tau_l, \tau_q)$ is false, but
$(match_{reorder} \circ match_{gen})(\tau_l, \tau_q)$ is true with the substitution $[(int, bool)/\alpha]$
and a permutation that swaps the order of a 2-element tuple. In the second case, we can apply the
reordering after we have substituted into type expressions that contain a tuple.
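
The compositions above can be made concrete with a small sketch. The Python below is illustrative only: the tuple encoding of types, the helper names (`instance_of`, `uncurry`, `reorderings`), and the way each composition is turned into a search over permutations are assumptions made for this example, not the algorithms of our SML implementation. It checks the enq and insert example for the first composition, and the counterexample showing that $match_{gen}$ and $match_{reorder}$ do not commute.

```python
from itertools import permutations

# Assumed encoding: a type variable is a str; every other type is a tuple
# (operator, child, ...), e.g. ("->", arg, result) or ("tuple", t1, t2).

def instance_of(general, specific, subst):
    """True if substituting for the variables of `general` can yield `specific`."""
    if isinstance(general, str):
        return subst.setdefault(general, specific) == specific
    if isinstance(specific, str):
        return False
    if general[0] != specific[0] or len(general) != len(specific):
        return False
    return all(instance_of(g, s, subst) for g, s in zip(general[1:], specific[1:]))

def match_gen(lib, query):
    """The library type is at least as general as the query type."""
    return instance_of(lib, query, {})

def same_type(t1, t2):
    """Equality modulo renaming of type variables."""
    return match_gen(t1, t2) and match_gen(t2, t1)

def is_fun(t):
    return not isinstance(t, str) and t[0] == "->"

def uncurry(t):
    """t1 -> t2 -> ... -> r becomes (t1, t2, ...) -> r (top level only)."""
    args = []
    while is_fun(t):
        args.append(t[1])
        t = t[2]
    if not args:
        return t
    return ("->", args[0] if len(args) == 1 else ("tuple", *args), t)

def reorderings(t):
    """Every type obtained by permuting the argument tuple of a function type."""
    if is_fun(t) and not isinstance(t[1], str) and t[1][0] == "tuple":
        return [("->", ("tuple", *p), t[2]) for p in permutations(t[1][1:])]
    return [t]

def reorder_after_uncurry(lib, query):
    # (match_reorder o match_uncurry): uncurry both, then compare up to argument order
    q = uncurry(query)
    return any(same_type(p, q) for p in reorderings(uncurry(lib)))

def gen_after_reorder(lib, query):
    # (match_gen o match_reorder): some permutation of lib is more general than query
    return any(match_gen(p, query) for p in reorderings(lib))

def reorder_after_gen(lib, query):
    # (match_reorder o match_gen): substitute into lib first, then permute the result
    return any(match_gen(lib, p) for p in reorderings(query))

a_T = ("T", "a")
query = ("->", "a", ("->", a_T, a_T))        # a -> a T -> a T
enq = ("->", ("tuple", "a", a_T), a_T)       # (a, a T) -> a T
insert = ("->", ("tuple", a_T, "a"), a_T)    # (a T, a) -> a T
print(reorder_after_uncurry(enq, query), reorder_after_uncurry(insert, query))  # True True

t_q = ("->", ("tuple", ("bool",), ("int",)), ("tuple", ("int",), ("bool",)))
t_l = ("->", "a", "a")
print(gen_after_reorder(t_l, t_q))   # False
print(reorder_after_gen(t_l, t_q))   # True
```

Enumerating permutations is exponential in the number of arguments; it is used here only because it mirrors the definitions directly.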


³ $match_{spec}(\tau_l, \tau_q) = match_{gen}(\tau_q, \tau_l)$, and
$match_{unify}(\tau_l, \tau_q) = (match_{gen} \circ match_{spec})(\tau_l, \tau_q) = (match_{spec} \circ match_{gen})(\tau_l, \tau_q)$


3. Module Matching


Function matching addresses the problem of locating a particular function in a component library. However,
a programmer often needs a collection of functions, e.g., one that provides a set of operations on an abstract
data type. Most modern programming languages explicitly support the definition of abstract data types
through a separate modules facility, e.g., CLU clusters, Ada packages, or C++ classes. Modules are also
often used just to group a set of related functions, like I/O routines. This section addresses the problem of
locating modules in a component library.

Recall that whereas the signature of a function is simply its type, $\tau$, the signature of a module is an
interface, $I$. A module’s interface is a pair, $\langle I_T, I_F \rangle$, where $I_T$ is a multiset of
user-defined types and $I_F$ is a multiset of function types.⁴ For a library interface,
$I_L = \langle I_{LT}, I_{LF} \rangle$, to match a query interface, $I_Q = \langle I_{QT}, I_{QF} \rangle$,
there must be correspondences both between $I_{LT}$ and $I_{QT}$ and between $I_{LF}$ and $I_{QF}$.
These correspondences vary for exact and relaxed module match.


3.1.  Exact  Match 

Definition: (Exact Module Match)

$M\text{-}match_E(I_L, I_Q) = \exists$ a mapping $U_F : I_{QF} \rightarrow I_{LF}$ such that

    $U_F$ is one-to-one and onto, and
    $\forall \tau_q \in I_{QF},\ match_E(U_F(\tau_q), \tau_q)$

$U_F$ maps each query function type $\tau_q$ to a corresponding library function type, $U_F(\tau_q)$. Since
$U_F$ is one-to-one and onto, the number of functions in the two interfaces must be the same (i.e.,
$|I_{LF}| = |I_{QF}|$). The correspondence between each $\tau_q$ and $U_F(\tau_q)$ is that they satisfy the
exact function match, $match_E$, defined in Section 2.1. That is, the types match modulo renaming of type
variables and user-defined type operators.

We could additionally require a mapping between user-defined types, but for the most part, matching
function types suffices, since for $\tau_q$ and $U_F(\tau_q)$ to match, any user-defined types must match. So
any user-defined type that appears in the domain or range of a function type in one interface must match
a user-defined type in the other interface. Not having a separate mapping precludes matching where one
user-defined type in $I_{QT}$ matches more than one user-defined type in $I_{LT}$ (or vice versa). This case
is not likely to occur in practice since programmers typically define only one user-defined type per module.


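$I_{QT} = \{\alpha\ C\}$

$I_{QF} = \{\ unit \rightarrow \alpha\ C,$
    $(\alpha, \alpha\ C) \rightarrow \alpha\ C,$
    $\alpha\ C \rightarrow (\alpha, \alpha\ C),$
    $\alpha\ C \rightarrow int\ \}$


Figure 2: A module query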


Figure 2 is an example query that describes a module containing the definition of an abstract container
type and a set of basic functions on the container. This matches the interface for the QUEUE module in
Figure 1 with the obvious mapping from function types in $I_{QF}$ to function types in QUEUE. Each of
the exact function matches renames the user-defined type operator T to C.
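
As a rough illustration of the definition, the following Python sketch searches for the mapping $U_F$ by brute force. The tuple encoding of types, the `canonical` renaming used to approximate $match_E$, and the function names `deq` and `len` are assumptions made for this example; the QUEUE interface is taken to be Figure 2 with C renamed to T, as the text above implies.

```python
from itertools import permutations

# Assumed encoding: a type variable is a str; every other type is a tuple
# (operator, child, ...). Operators listed in BUILTIN are never renamed.
BUILTIN = {"->", "tuple", "unit", "int"}

def canonical(t, names=None):
    """Rename variables and user-defined operators in order of first occurrence."""
    names = {} if names is None else names
    if isinstance(t, str):
        return names.setdefault(("var", t), "v%d" % len(names))
    op = t[0]
    if op not in BUILTIN:
        op = names.setdefault(("op", op), "U%d" % len(names))
    return (op,) + tuple(canonical(c, names) for c in t[1:])

def match_exact(lib, query):
    """Exact function match: equal modulo the renaming done by `canonical`."""
    return canonical(lib) == canonical(query)

def m_match_exact(library, query):
    """Library names in query order if a one-to-one, onto U_F exists, else None."""
    if len(library) != len(query):
        return None
    for names in permutations(library):
        if all(match_exact(library[n], q) for n, q in zip(names, query)):
            return list(names)
    return None

def container(op):
    """The four function types of Figure 2, over the type operator `op`."""
    c = (op, "a")
    return [("->", ("unit",), c),            # unit -> a C
            ("->", ("tuple", "a", c), c),    # (a, a C) -> a C
            ("->", c, ("tuple", "a", c)),    # a C -> (a, a C)
            ("->", c, ("int",))]             # a C -> int

query = container("C")
# "deq" and "len" are placeholder names chosen for this sketch.
queue = dict(zip(["create", "enq", "deq", "len"], container("T")))

print(m_match_exact(queue, query))      # ['create', 'enq', 'deq', 'len']
print(m_match_exact(queue, query[:3]))  # None: U_F cannot be onto
```

Because `canonical` renames operators independently for each function type, the sketch has no separate mapping between user-defined types, which matches the simplification discussed above.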

Exact module match is rather restrictive. We define two forms of relaxed module match by (1) modifying
the mapping $U_F$ in the above definition and (2) replacing the definition of function match, $match_E$.


⁴ For useful feedback to the user, we would need to associate names with the function types, but this is not
necessary in the definition.


3.2. Partial Match


Should a querier really have to specify all the functions provided in a module in order to find the module?
A more reasonable alternative is to allow the querier to specify a subset of the functions (namely, only those
that are of interest) and match a module that is more general in the sense that it may contain functions in
addition to those specified in the query.

Definition: (Generalized Module Match)

$M\text{-}match_{gen}(I_L, I_Q)$ is the same as $M\text{-}match_E(I_L, I_Q)$ except $U_F$ need not be onto.

Thus whereas with $M\text{-}match_E(I_L, I_Q)$, $|I_{LF}| = |I_{QF}|$, with $M\text{-}match_{gen}(I_L, I_Q)$,
$|I_{LF}| \geq |I_{QF}|$, and $I_{LF} \supseteq I_{QF}$ under the appropriate renamings. A query like that in
Figure 2 but without the function type $\alpha\ C \rightarrow int$ would match QUEUE under the generalized
module match definition.

Definition: (Specialized Module Match)

$M\text{-}match_{spec}(I_L, I_Q) = M\text{-}match_{gen}(I_Q, I_L)$


With specialized module match, a library module need not have all the functions defined in the query. As
with specialized and generalized match for functions, specialized module match is the converse of
generalized module match.
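
The two partial matches differ from exact module match only in which side may have functions left over. A minimal Python sketch, under the same assumed encoding as before (a type variable is a string, any other type is a tuple headed by its operator); the extra query type $\alpha\ C \rightarrow bool$ is hypothetical and added only to exercise the specialized case:

```python
from itertools import permutations

BUILTIN = {"->", "tuple", "unit", "int", "bool"}   # operators that are never renamed

def canonical(t, names=None):
    """Rename variables and user-defined operators in order of first occurrence."""
    names = {} if names is None else names
    if isinstance(t, str):
        return names.setdefault(("var", t), "v%d" % len(names))
    op = t[0]
    if op not in BUILTIN:
        op = names.setdefault(("op", op), "U%d" % len(names))
    return (op,) + tuple(canonical(c, names) for c in t[1:])

def match_exact(lib, query):
    return canonical(lib) == canonical(query)

def m_match_gen(lib, query):
    """Some one-to-one (not necessarily onto) U_F maps query types to library types."""
    return any(all(match_exact(l, q) for l, q in zip(chosen, query))
               for chosen in permutations(lib, len(query)))

def m_match_spec(lib, query):
    """Specialized module match is generalized match with the arguments swapped."""
    return m_match_gen(query, lib)

def container(op):
    """The four function types of Figure 2, over the type operator `op`."""
    c = (op, "a")
    return [("->", ("unit",), c), ("->", ("tuple", "a", c), c),
            ("->", c, ("tuple", "a", c)), ("->", c, ("int",))]

queue = container("T")
small = container("C")[:3]                                 # Figure 2 without a C -> int
big = container("C") + [("->", ("C", "a"), ("bool",))]     # hypothetical extra function

print(m_match_gen(queue, small), m_match_gen(queue, big))    # True False
print(m_match_spec(queue, small), m_match_spec(queue, big))  # False True
```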


3.3.  Relax*  Match  (Using  Relaxed  Function  Matches) 

In the definition of exact module match we used the exact match predicate, $match_E$, to determine whether
a function in the query interface matches one in the library interface. An obvious relaxation is to use a
relaxed match on functions instead of exact match.

Definition: (Relax* Match)

$M\text{-}match_{relax^*}(I_L, I_Q, M_F) = \exists$ a mapping $U_F : I_{QF} \rightarrow I_{LF}$ such that

    $U_F$ is one-to-one and onto, and
    $\forall \tau_q \in I_{QF},\ M_F(U_F(\tau_q), \tau_q)$

The only difference between relax* match and exact module match is that relax* match uses its parameter,
$M_F$, to match functions, instead of fixing function match to be exact, $match_E$. Thus, exact module match
is trivially defined in terms of the above definition:
$M\text{-}match_E(I_L, I_Q) = M\text{-}match_{relax^*}(I_L, I_Q, match_E)$.
The match parameter ($M_F$) gives us a great deal of flexibility, allowing any of the function matches defined
in Section 2 to be used in matching the individual function types in a module interface.

What  this  definition  makes  clear  in  a  concise  and  precise  manner  is  the  orthogonality  between  function 
match  and  module  match. 


3.4. Composition of Module Matches

As with function matches, we can compose module matches. Since specialized module match can be defined
in terms of generalized module match, we need only consider the composition of generalized module match
and relax* match. The order of the composition does not matter, since generalized match affects the mapping
$U_F$ while relax* match changes only the function match used.

Definition: (Generalized Relax* Match)

$M\text{-}match_{gen\text{-}relax^*}(I_L, I_Q, M_F)$ is the same as $M\text{-}match_{relax^*}(I_L, I_Q, M_F)$
except $U_F$ need not be onto.

We  present  this  as  a  separate  definition  because  we  expect  this  combined  relaxed  match  to  be  the  most 
common  use  of  module  match  in  practice. 


$I_{QT} = \{\alpha\ C\}$

$I_{QF} = \{\ unit \rightarrow \alpha\ C,$
    $(\alpha, \alpha\ C) \rightarrow \alpha\ C\ \}$


Figure 3: Another module query

Figure 3 shows another example of a module query. This query contains only two function types. Under
generalized module match, this query would match only the QUEUE module (with $U_F$ mapping the query
functions to create and enq). Under generalized relax* match, using function reorder match, the query
matches not only QUEUE but also the SET module (with $U_F$ mapping the query functions to create and
insert (or delete), and reordering the input arguments to insert).
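
The Figure 3 example can be replayed with a short Python sketch in which the function match is passed as a parameter, as in the definition of generalized relax* match. The encoding, the helper names, and the partial QUEUE and SET interfaces (only the functions whose types are quoted in the text) are assumptions made for this illustration.

```python
from itertools import permutations

BUILTIN = {"->", "tuple", "unit"}   # operators that are never renamed

def canonical(t, names=None):
    """Rename variables and user-defined operators in order of first occurrence."""
    names = {} if names is None else names
    if isinstance(t, str):
        return names.setdefault(("var", t), "v%d" % len(names))
    op = t[0]
    if op not in BUILTIN:
        op = names.setdefault(("op", op), "U%d" % len(names))
    return (op,) + tuple(canonical(c, names) for c in t[1:])

def match_exact(lib, query):
    return canonical(lib) == canonical(query)

def reorderings(t):
    """Every type obtained by permuting the argument tuple of a function type."""
    if (not isinstance(t, str) and t[0] == "->"
            and not isinstance(t[1], str) and t[1][0] == "tuple"):
        return [("->", ("tuple", *p), t[2]) for p in permutations(t[1][1:])]
    return [t]

def match_reorder(lib, query):
    """Exact match after some permutation of the library function's arguments."""
    return any(match_exact(p, query) for p in reorderings(lib))

def m_match_gen_relax(library, query, match_fn):
    """Library names in query order for a one-to-one U_F under match_fn, else None."""
    for names in permutations(library, len(query)):
        if all(match_fn(library[n], q) for n, q in zip(names, query)):
            return list(names)
    return None

c, t = ("C", "a"), ("T", "a")
query = [("->", ("unit",), c), ("->", ("tuple", "a", c), c)]     # Figure 3
QUEUE = {"create": ("->", ("unit",), t),
         "enq": ("->", ("tuple", "a", t), t)}
SET = {"create": ("->", ("unit",), t),
       "insert": ("->", ("tuple", t, "a"), t),
       "delete": ("->", ("tuple", t, "a"), t)}

print(m_match_gen_relax(QUEUE, query, match_exact))    # ['create', 'enq']
print(m_match_gen_relax(SET, query, match_exact))      # None
print(m_match_gen_relax(SET, query, match_reorder))    # ['create', 'insert']
```

Passing `match_exact` reduces the sketch to generalized module match, which reflects the orthogonality between the mapping $U_F$ and the function match used.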


4.  An  Experimental  Signature  Matching  Facility 


We have integrated a signature matching facility into our local Standard ML (SML) programming
environment. Our current implementation, itself written in SML, supports a subset of the function matches
defined in this paper. The algorithms for generalized and specialized match are modifications of Robinson’s
unification algorithm, as presented by Milner [Mil78]. The algorithms for the other matches are
straightforward, but in some cases naive.

We used our facility to perform queries over a library of SML code consisting of 31 modules containing 245
functions. The majority of queries tested take less than 0.016 seconds (16 milliseconds) to complete. Figure 4
shows some actual output from an SML command-line-based version of the implementation. Type notation
is from SML: * is the tuple constructor, -> is the function constructor, 'a, 'b denote type variables (''a is
notation for equality types, and '1a notation for reference types; we do not distinguish equality or reference
types in the current implementation). Using exact match, the query, ('a * 'a T) -> 'a T, returns two
matches: adjoin from a Set implementation, and cons from a lazy stream implementation. Using uncurry
match on the same query yields an additional match: the enq function on a sortable queue. The third
query, 'a list -> 'b list -> ('a * 'b) list, is drawn from a real use of our system by a coworker. He
needed a function to take two lists and create a list of pairs of elements from those lists. The zip function
that was found by this query did exactly what he wanted; moreover, he was able to use zip’s code in his
program without any modification. An example that gave us a surprising result came from performing the
query 'a list -> 'a using specialized match. We expected this query to return functions such as one that
selects an element from a list (hd); instead, we retrieved the implodePath function which takes a list of
strings that represent a file path name (e.g., [“usr”, “amy”, “tex”]), and returns a string of the entire path
name (“usr/amy/tex”).

>> Query = ('a * 'a T) -> 'a T, Matcher = exact
adjoin :((''a * ''a T) -> ''a T) [11 test/Set.sml]
cons :(('1a * '1a T) -> '1a T) [16 test/lstream.sml]
[CPU Time: 0.015625 secs., Elapsed Time: 0 secs.] (Objects Found: 2)

>> Query = ('a * 'a T) -> 'a T, Matcher = uncurry
adjoin :((''a * ''a T) -> ''a T) [11 test/Set.sml]
enq :('a -> ('a T -> 'a T)) [15 test/SortableQueue.sml]
cons :(('1a * '1a T) -> '1a T) [16 test/lstream.sml]
[CPU Time: 0.0 secs., Elapsed Time: 0 secs.] (Objects Found: 3)

>> Query = 'a list -> 'b list -> ('a * 'b) list, Matcher = exact
zip :('a list -> ('b list -> ('a * 'b) list)) [11 test/ListFns.sml]
[CPU Time: 0.015625 secs., Elapsed Time: 0 secs.] (Objects Found: 1)

>> Query = 'a list -> 'a, Matcher = specialize
implodePath :(string list -> string) [102 test/Pathname.sml]
[[Substitute string for 'a]]
[CPU Time: 0.015625 secs., Elapsed Time: 0 secs.] (Objects Found: 1)


Figure 4: Sample Output from Function Matching


Our user interface is simplistic: just gnu-emacs [Sta86] and a mouse. The user pre-loads a specified
component library. The emacs command is similar to that for string searches. The result of a query is a
list of functions whose types each match the query, along with the pathname for the file that contains
the function. Clicking the mouse on a function in the list causes the file in which the function is defined
to appear in another buffer, with the cursor located at the beginning of the function definition. We chose
to use emacs for our interface rather than a flashier graphical user interface in order to give programmers
easy access to signature matching from their normal software development environment. Thus we achieve
the goal of having signature matching as easily available for use as string searching.


5.  Related  Work 


Closely related work on signature matching has been done by Rittri, Runciman and Toyn, and Rollins and
Wing. We review and compare our work with theirs below.

Mikael Rittri defines the equivalent of $match_{reorder}^{*} \circ match_{uncurry}^{*}$ by identifying types
that are isomorphic in a Cartesian closed category [Rit89]. He has also developed an algorithm to check for
more general types modulo this isomorphism [Rit92], the equivalent of our
$match_{reorder}^{*} \circ match_{uncurry}^{*} \circ match_{gen}$. He has implemented both matches.

Runciman  and  Toyn  [RT89]  assume  that  queries  are  constructed  by  example  or  by  inference  from  context 
of  use.  They  use  queries  to  generate  a  set  of  keys,  performing  various  operations  on  the  set  to  permit  more 
efficient  search.  The  match  they  ultimately  perform  is  similar  to  our  unify  match. 

Rollins and Wing [RW91] also implemented a system in Lambda-Prolog that includes the equivalent
of $match_{reorder}^{*} \circ match_{uncurry}^{*}$. They also extended signature matching to perform a
restricted kind of specification matching (Section 6).

Our work is unique in two ways. First, we have identified a small set of primitive function matches that
can be combined in useful ways. Definitions of signature matching given by others are just special cases of
our more general approach; we can succinctly characterize their definitions (as above) and do so in a common
formal framework. We support orthogonality of concepts, allowing the user to “pick and choose” whichever
match is desired, perhaps through a combination of more primitive matches. Second, all previous work has
focused solely on matching at the function level. We extend signature matching to include matching on
modules as well. Moreover, since we define all our function match definitions to follow a common form, we
are able to use function match as a parameter to module match.

Less closely related work, but relevant to our context of software library retrieval, divides into two
categories: text-based search and classification schemes. In text-based search, textual information, such
as function names and comments, is used to locate desired software components. Attempts to make this
approach more formal involve imposing a particular structure on comments to increase the accuracy of
search. One example of such a system is REUSE [AS86]. Information about a component includes the
component’s name, author, and language processor to use.

Finally, some work on classifying software enables control over the search space. A classification scheme
primarily facilitates browsing, as the goal is to provide a structure through which a user can navigate to
locate a desired module. Prieto-Diaz [PD89] describes a method of classifying software using facets (e.g.,
the function the software performs, the objects manipulated by the software, the medium used, the type
of system, the functional area, and the setting of the application). The REUSE system also includes a
classification structure which is used to guide search through a menu system.

As  mentioned  in  the  introduction,  we  view  signature  matching  as  a  complementary  approach  to  these 
more  traditional  information  retrieval  techniques.  For  example,  a  classification  scheme  could  be  used  in 
conjunction  with  signature  matching  for  a  “pipelined”  query:  The  first  stage  of  the  pipeline  would  use  a 
classification  scheme  to  prune  the  search  space  for  the  second  stage,  which  would  use  signature  matching. 


6.  Summary  and  Future  Work 


This  paper  lays  the  foundation  for  the  use  of  signature  matching  as  a  practical  tool  for  the  software  engineer 
to  aid  in  the  retrieval  of  software  for  reuse.  We  present  precise  definitions  for  a  variety  of  matches  at  both  the 
function  and  module  levels.  Areas  for  further  work  include  evaluating  the  usefulness  of  signature  matching, 
defining  additional  relaxations,  and  going  beyond  signatures  to  specification  matching. 

We  plan  to  do  more  extensive  evaluation  using  our  signature  matching  facility  by  conducting  experiments 
with  real  users  from  our  local  SML  research  community  (which  includes  over  20  graduate  students,  staff, 
and  faculty).  This  evaluation  will  serve  two  purposes:  to  identify  places  for  performance  improvements,  and 
more  importantly,  to  provide  feedback  on  the  utility  of  signature  matching  for  software  reuse. 

The existing set of relaxed function matches may still miss some potentially useful functions. To capture
some of these additional matches, we will need to expand our type system to model additional characteristics
of functions. Two examples of such characteristics are mutability and exceptional behavior. Even in “mostly”
functional languages like ML, there may be functions that mutate objects. Thus if there are two functions
that perform the same operation but one does so by creating a new object
($(\alpha, \alpha\ T) \rightarrow \alpha\ T$) and the other by mutating an input object
($(\alpha, \alpha\ T) \rightarrow unit$), we would like to be able to say these are “the same” under
some relaxation. Similarly, we would like to be able to match functions that are the same except in their
behavior under exceptional conditions.

In the introduction we argued that signature matching is an instance of using domain-specific information
to do search. In an ideal software library, domain-specific information would include not just signature
information, but formal specifications of the behavior of each component. Given such specifications, e.g.,
pre-/post-conditions for functions, we could then add to our arsenal of search tools a specification matcher,
using specifications, not just signatures, as search keys. Consider the query
$(\alpha\ T, \alpha\ T) \rightarrow \alpha\ T$, which matches the union, intersection and difference
functions on sets. Specification matching would let us distinguish among these three since their behaviors
differ even though their types are the same. We are pursuing specification matching in the context of Larch
[GH93] and Larch/ML [WRZ93] at Carnegie Mellon; Stringer-Calvert has proposed to do specification
matching for Ada at the University of York [SC93]. Unfortunately we cannot as yet expect programmers to
document their program components with formal specifications.⁵ Signature matching backs off from this
more ambitious approach.


Hence, signature matching offers the greatest amount of information about program modules for the
least overhead. We can exploit information that programmers already must generate for the compiler, i.e.,
function types and module interfaces. (Thus we get the search keys for free.) Implementing signature
matching requires nothing more sophisticated than unification, a standard algorithm already used in some
compilers to do type inference. In return we get a useful tool for retrieving software modules.

⁵ Though, if we provided them with efficient specification matchers, maybe there would be additional incentive to write